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Abstract 

With the development of numbers of high resolution data acquisition systems and the global re- 
quirement to lower the energy consumption, the development of efficient sensing techniques becomes 
critical. Recently, Compressed Sampling (CS) techniques, which exploit the sparsity of signals, have 
allowed to reconstruct signal and images with less measurements than the traditional Nyquist sensing 
approach. However, multichannel signals like Hyperspectral images (HSI) have additional structures, like 
inter-channel correlations, that are not taken into account in the classical CS scheme. 

In this paper we exploit the linear mixture of sources model, that is the assumption that the mul- 
tichannel signal is composed of a linear combination of sources, each of them having its own spectral 
signature, and propose new sampling schemes exploiting this model to considerably decrease the number 
of measurements needed for the acquisition and source separation. Moreover, we give theoretical lower 
bounds on the number of measurements required to perform reconstruction of both the multichannel 
signal and its sources. We also proposed optimization algorithms and extensive experimentation on our 
target application which is HSI, and show that our approach recovers HSI with far less measurements 
and computational effort than traditional CS approaches. 
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Compressed sensing, source separation, hyperspectral image, linear mixture model, sparsity, 
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I. Introduction 

A Hyperspectral Image (HSI) is a collection of hundreds of images that have been acquired simultane- 
ously in narrow and adjacent spectral bands, typically by airborne sensors. HSI are produced by expensive 
spectrometers that sample the light reflected from a two-dimensional area. An HSI data set is thus a "cube" 
with two spatial and one spectral dimensions. Hyperspectral imagery has many applications including 
environmental monitoring, agriculture planning or mineral exploration. The plurality of channels in HSI 
makes it possible to discriminate among the various materials that make up a geographical area: each of 
them is represented by a unique spectral signature. Accordingly, HSI are often processed via clustering 
or source separation methods to obtain segmentation maps locating and labeling the various materials 
appearing in the image. Unfortunately, having multiple channels comes at a price: the sheer volume 
of data makes acquisition, transmission, storage and analysis of HSI computationally very challenging. 
Therefore, the problem addressed in this paper is to reduce the complexity of manipulating HSI via a 
suitable compression or dimensionality reduction technique. 

In this context the emerging Compressive sensing (CS) theory, which addresses the problem of recover- 
ing signals from few linear measurements, seems ideally suited ||2|. The main assumption underlying 
CS is that the signal is sparse or compressible when expressed in a convenient basis. A signal x G 
is said fc-sparse in a basis if it is a linear combination of only k basis vectors of The signal x 
is said sparse when k <^ n and compressible if the coefficient's magnitudes, when sorted, have a fast 
power-law decay, meaning that the signal has few large coefficients and many small coefficients. The 
recent literature abounds with examples of sparse models for signals and images. 

While the CS-community has mostly focused on ID or 2D signals, few works have been done on 
higher dimensional signals, in particular multi-array signals such as HSI. Extensions of wavelets basis 
for 3D data have been proposed |3| and rather generic sparse models have been exploited in |[5| 
for designing innovative compressive hyperspectral imagers. However, multi-array signals such as HSI 
have usually some structures that go beyond the sparsity assumption. Indeed, HSI can be interpreted 
as a mixture of sources, each of them having a specific spectral signature. This model is widely used 
for unmixing HSI [[6|-[[T0|, that is extracting, form the HSI, each source and their respective spectral 
signatures. 

The main focus of this paper is to exploit, beyond the sparsity assumption, an additional structured 
model, the linear mixture model, so as to reconstruct and separate the sources of multi-array signals 
assuming we know their spectra (or mixing parameters) as side information. Note that this hypothesis is 
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validated in many applications where the elements or materials composing the data are known and their 
spectra tabulated. This idea was first introduced in two of our conference papers [ [TT| , [ [T2| . In this paper, 
we introduce and analyze a new sampling scheme, which exploits this structured model, and that has the 
following important properties: 

• the number of measurements, or samples, does not scale with the number of channels, 

• the recovery results do not depend on the conditioning of the mixing matrix (as long as the mixing 
spectra are linearly independent). 

We propose new algorithms for HSI compressive source separation (CSS), that is source separation and 
data reconstruction from compressed measurements, which are based on exploiting the linear mixture 
structure and TV, ii or Iq regularization. We establish that sources can be efficiently separated directly on 
the compressed measurements, i.e avoiding to run a source separation algorithm on this high-dimensional 
raw data, thereby eliminating this important bottleneck and providing a rather striking example of 
compressed domain data processing. We provide theoretical guaranties and intensive experiments which 
show that, with this approach, we can reconstruct a multi-array signal from compressed measurements 
with a far better accuracy than traditional CS approaches. For example, we are able to reconstruct HSI 
datasets with only 3% relative error from 3% of measurements and less than l%o of data transmission, 
with an algorithm that is more than 40 times faster. While the main target application of this paper is 
HSI, our model and the theoretical analysis is general and could be applied to other multi-array signals 
like e.g. Positive Emission Tomography (PET) or distributed sensing. 

The remainder of this paper is structured as follows. The necessary background and notations are 
first introduced in Section [ll| We then propose, in Section |ni| two acquisition schemes that exploit 
the prior knowledge of the mixing parameters so as to perform a decorrelation step. In Section |IV| 
we provide theoretical guarantees for both source identification and data reconstruction. We determine 
the number of CS measurements sufficient for robust source identification and signal reconstruction as a 
function of the sparsity of the sources, sampling SNR and the conditioning of their corresponding mixture 
parameters. In Section |V| we discuss in further details the application of our acquisition and recovery 
schemes for HSI. We introduce different recovery algorithms that we compare with the classical methods, 
for various CS acquisition schemes on two sets of HSI. Finally, in the spirit of reproducible research, 
the code and data needed to reproduce the experimental sections of this paper is openly available at 



http://infoscience.epfl.ch/record/l 809 1 Ij 
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II. Background and Notations 

A. CS of Multichannel Signals 

We represent a multichannel signal with a matrix X G MJ^^^^^ where 112 is the number of channels 
and ni is the dimension of signal in each channel. The CS acquisition protocol of a multichannel signal 
X is a linear mapping A : W^^^^'^ of X into a CS measurement vector y G contaminated by 

the measurement noise z G R^: 

y = A{X) + Z. 

When m <C nin2 the signal is effectively compressed. The main goal of CS is to recover the signal 
X from the fewest amount of measurements m. Note that any linear mapping A{X) can be written in 
matrix form AXyec •= A{X), where A G R^x^i^2 ^j^j Xyec ^ M"^'"^' is the vectorized form of matrix 

^ = AXyec + (1) 

In order to recover Xyec, we can search for the sparsest vector Xyec which is consistent with the 
measurement error, leading to the following ^o-n^inimization problem: 

argmin ||X^ec|ko <5-^- \\y - ^Xyech < (2) 

where e is an upper bound on the norm of the noise vector (i.e. ||2:||2 < s), \\-\\io denotes the io quasi-norm 
of a vector (i.e., the number of its nonzero coefficients). Unfortunately, this combinatorial minimization 
problem is NP-hard in general [ [T3| , [ [T4| . However, there are two tractable alternatives to solve problem 
([2]): The convex relaxation leading to ^1 -minimization and greedy algorithms such as matching pursuits 
(MP) [ [T3| or Iterative Hard Thresholding (IHT) |fT5|. Both types of approaches provide conditions on 
the matrix A and on the sparsity k such that the recovered solution coincides with the original signal 
Xyec, and consequently also with the solution of Q. 

The £1 minimization approach consists in solving the following non-smooth convex optimization 
problem called Basis Pursuit DeNoising (BPDN): 

argmin ||X^ec||i s.t. \\y - AXyech < (3) 

where || • ||i denotes the ii norm, which is equal to the sum of the absolute values of the vector entries, 
II • II 2 denotes the £2 or Euclidean norm. 

It has been shown in |[T|, |[2|, [ |T6| that approximating the sparse recovery problem by the ii minimiza- 
tion ([3]) can stably recover the A:n2-sparse original solution (i.e. fc-sparse signal per channel) whenever A 
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satisfies the so-called restricted isometry property (RIP). This result guarantees that sparse signals can be 
perfectly recovered from noise-free measurements and that the recovery process is robust to the presence 
of noise. The computation of the isometry constants for a given matrix is prohibitive in practice, but 
certain classes of matrices, such as matrices with independent Gaussian or Bernoulli entries, obey the 
RIP condition with high probability (see Theorem 5.2 in |17]) as long as: 

m > cn2k\og{ni/k). (4) 

for a fixed constant c. 



B. Sparse Regularization of a Multichannel Signal 

Usually the data X^ec is not directly sparse, but sparse in a basis E R^i^sxmns jj^ ^^^^ c^sq^ the 
^1 regularization approach consists in solving the following problem which generalizes problem ([3]): 

argmin ||e^ec||i s.t. \\y - A^Q^^ch < (5) 

with X^ec = ^Qvec Stable reconstruction by solving problem ([5]) is guaranteed as long as the A"^ matrix 
satisfies the RIP. When the data is a multichannel image, a classical basis is a block diagonal orthonormal 
basis ^ = Idn^ ®^^2d Q where ^^2d ^ MP^^^^ denotes a proper 2-dimensional wavelet basis. 

Another classical approach to regularize the data (specially images) is the total variation (TV) penalty 



flSf , which tends to generate images with piecewise smooth regions and sharp boundaries. Replacing the 
^1 norm with the TV norm on each channel Xj of the multichannel in problem ([5]) leads to the Total 
Variation De-Noising (TVDN) problem: 

argmin W^jWrv s.t. \\y - AXyech ^ ^- (6) 

^ u 

C. The Linear Mixture Model 

One of the most practical setups of a multichannel signal is when the multichannel data matrix X is 
derived by a sparse linear mixture model as follows: 

X = SH^. (7) 

Here, S E R^i><^ denotes the source matrix whose zth column contains the proportion of the source i at 
each pixel. Each source is mixed with the corresponding column of the mixing matrix H E R^^xp order 

^Idn2 is the 712 X 712 identity matrix and denotes the matrix Kronecker product. 
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to generate the full multichannel data. Each column of H contains the spectrum of the corresponding 
source. The observed signal in any channel j G {1, ... ,77.2} is thus a linear combination of p source 
signals: 

p 

i=l 

D. Mixing Parameters as Side Information for Multichannel CS Recovery 

In certain multichannel signal acquisition setups the mixing parameters H are known at both decoder 
and encoder sides. In particular, this is the case in many remote sensing applications where the spectra of 
common materials are tabulated. Such prior efficiently restricts the degrees of freedom of the entire data 
matrix to the sparse coefficients of the underlying sources. Indeed, we will show that, when we know the 
mixing parameters H, the inverse problem consisting in recovering the multichannel signal X from the 
measurements in ([T]) is equivalent to the problem of recovering the sources S^ec ^^om the following 
measurements: 

y = A$S^ec + (8) 

with $ = H ® Id^^ . The source coefficients can then be recovered by solving a convex optimization 
problem such as ([5]), where A is replaced by and the multichannel signal can be reconstructed 
by applying the mixing matrix to the recovered source matrix according to the linear mixture model 
([7]). This approach has the advantage of solving two problems: i) source separation directly from the 
compressive measurements, ii) data compressive sampling via source separation or, equivalently, via a 
particular structured sparse model. 

III. Compressive Multichannel Signal Acquisition Schemes 

If the multichannel signal follows the linear mixture model (|7]), the knowledge of the mixing matrix 
can be used efficiently. The sparse source coefficients can be directly recovered from the measurements. 
In this section we introduce a decorrelation mechanism, applied at the acquisition process or as a post- 
processing step, which has two main advantages: first it leads to strong dimensionality reduction and 
secondly it improves the conditioning of the recovery problem. 
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A. Multichannel Recovery via Source Recovery 

When we know the mixing matrix H, and thanks to the property ({BCD)yec = {D^ ® B)Cyec) of 
the Kronecker product, the sampUng equation ([T]) (in the noise free case) can be written as: 

AX.ec = ^(SH^),ec = A (H Id, J S,ec = A^S.^c- (9) 

" V ' 

Then, the ii regularization approach for the recovery of the whole data consists in finding the sparsest 
coefficients vector @vec ^ ^^^^ of the sources vector S^ec = *0i;ec in a basis * G R^^i^^^i, where 
e.g. * = Idp 0^^2D is a block diagonal orthonormal basis, through the following minimization: 

argmin ||0^ec||i s.t. \\y - A^'^Gyech < (10) 

This corresponds to a "synthesis" formulation of BPDN using a basis The "analysis" formulation, 
which is equivalent to the synthesis one when * is a basis but different when * is a redundant dictionary, 
consists in solving the following problem with respect to the sources instead of its coefficients: 

argmin ||**S^ec||i s.t. \\y - A^Syech < £^ (H) 

where is the adjoint of the operator 

The data X can then be recovered via the mixture model X = SH^, with S^ec being either the 
solution of the analysis problem ( [TT] ) or S^ec being equal to S^ec = "^^vec with 0^ec. solution of the 
synthesis problem ([TO]). 



B. Decorrelation Scheme 



We have seen in section |II-A[ that the conditions to recover the signal from the noisy measurements 
y = AXyec + ^ depend on properties (such as RIP) of the sensing matrix A. We introduce a particular 
structure for the sampling matrix A which benefits from the available knowledge of the mixture parameters 
H and incorporates data decorrelation into the compressive acquisition. 

1) Decorrelating Multichannel CS Acquisition: The decorrelation mechanism consists of applying 
the Moore-Penrose pseudo inverse matrix H^^ = (H^H)~^H^ in order to remove the underlying 
dependencies among CS measurements. We therefore propose the following sampling matrix: 

A^H^^A. (12) 

The main sampling matrix is generated from a smaller-size m x ni core sampling matrix A. Note that 
CS imposes m <C ni. 
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The total number of measurements is m = pm. Applying the sampling matrix A of ([12]) on multi- 
channel data results in the following CS measurements: 

y = A^S^ec + z (13) 

= (Ht I) (H Id^, ) S^ec + 

^ V V ' 

A ^ 

= (Idp®I) S^ec + ^. (14) 

^ V ' 

-A 

The third equality comes from the following property: {B ® C){D F) = BD ® CF, and Ap is a 
block diagonal matrix whose p diagonal blocks are populated with A: Ap = Idp A. 

As we can observe in ([14]) and thanks to the specific structure of the sampling matrix, the mixing 
parameters H are discarded from the formulation and each source (each column of S) is directly 
subsampled by the same matrix A. 

2) Uniform Multichannel CS Acquisition: In many practical setups the acquisition scheme can not be 
arbitrarily chosen and is rather determined by various constraints posed by the physics of the signals 
and the implementation technology. Certain acquisition systems such as Rice's single-pixel hyperspectral 
imager Q are using a universal random matrix to sample independently data in each channel. In this 
case, acquisition models such as ( [T2] ), which require inter-channel interactions for compressed sampling, 
simply cannot be implemented. Here, the sampling matrix A in ([T]) is block diagonal with 77.2 blocks 
(each applies on a certain channel) that are populated by a unique fh x ni matrix (similarly as A in 
([H)): 

A = An,=ldn,®A. (15) 

The total number of measurements is then m ^ 712 fh. Reshaping y and z correspondingly into fh x 
712 matrices Y (the measurement matrix) and Z (the noise matrix) leads to the following equivalent 
formulation: 

Y ^AX + Z. 

3) Decorrelation-based Uniform Sampling: A decorrelation step similar to the one introduced in 



Section III-B 1 can be applied on the CS measurements. It consists in multiplying the rows of the 



measurement matrix by (H^^)^ and reducing the dimensionality of y to an m x p matrix as follows: 

y* = y(Ht)^ 
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where, = Z{¥L'^)^. By reshaping y* and into the vectors ?/* and z*, we can observe that the 
outcome of such decorrelation-based uniform sampling leads to an expression similar to ([14]) i.e., 

7/* =IpS^ec + ^*. (16) 

This decorrelating scheme favorably reduces the dimension of the data: at the acquisition stage, the 
total number of samples is n2 fa but at the transmission and decoding stages the number of samples is 
only pfh <^n2fh. 



For the decorrelating sampling schemes described in section III-B 1 and III-B3 the minimization 



(e.g. the "synthesis" problem ([TO])) of section [Hl-AI takes the following form: 



argmin ||0^ec||i s.t. \\y - Ap'^Q^ech < (17) 
which, in the noiseless case can be decoupled into p independent minimizations, each of them 



corresponding to a certain source compressed by a universal matrix A. In Section |rV] we provide the 
theoretical analysis of such recovery scheme for various acquisition schemes. 

IV. Main Theoretical Analysis 

Compressive sparse source recovery is closely related to the problem of compressed sensing with 
redundant dictionaries [ |T9| , pO| . Indeed, the later problem has the same formulation as in ([10]) by 
replacing $ by an overcomplete dictionary matrix. The first part of this section provides an overview of 
the CS literature on redundant dictionaries. In the second part, we derive new performance bounds that 
extend the former results for a larger class of dictionaries. In the third part, we cast the sparse source 
separation problem as a particular case of CS recovery using redundant dictionaries and we give a bound 
on the performance of the li minimization for each of the considered CS acquisition schemes (dense, 
uniform and decorrelated). 



A. Compressed Sensing and Redundant Dictionaries 

Let X G R"^ be a vector that is sparse in a dictionary T> ^W^^"^ {i.e., x ^T>9 with, 9 G W^). The 
minimization approach for recovering 9 (equivalently x) from the compressive measurements y = Ax + z 
consists in solving: 

argmin ||6>||i s.t. \\y-AT>9\\2<e, (18) 
e 

where, ||2:||2 < e. Note that in this section A is a sampling matrix of size m x n and the dictionary D 
typically contains a large number of columns (d:^ n). 
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It has been shown in |[T|, |[2| that the ii minimization ([18]) can stably recover the original solution 



whenever AD satisfies the restricted isometry property (RIP). More precisely, if for all fc-sparse vectors 
6 the following RIP property holds: 

(l - 4(^D)) ll^lli < \\AT^e\\l < (l + 4(^D)) ||^||2 (19) 

with the RIP constant of order fc, 5k{AY>) < \/2 — 1, then the solution 9 to ( fTS) ) satisfies the following 
error bound: 

11^ - ^Ib < CO 11^ - 9k\\i + cie, (20) 
for some positive constants cq, ci, and where 9^ is the best fc-sparse approximation of 9. Now the question 



is how many CS measurements are sufficient so that AT> satisfies the RIP ? It has been shown in [ [T9| that, 
for a certain class of random sampling matrices A (e.g., with i.i.d. Gaussian, Bernoulli or subgaussian 
elements), with very high probability the RIP constant SkiAD) is bounded by: 

4(AD) < Sk{A) + 4(D) + 4(^)4(D). (21) 

If D is an orthonormal basis, then Sk(D) = and AT> becomes another subgaussian matrix with a 
similar distribution as for A and thus ( [2T] ) holds with equality i.e., SkiAT)) = Sk{A). 

Considering the recovery condition using £i minimization (i.e., SkiAT)) < \/2 — 1) and the bound in 



pT] ), we can conclude that A must satisfy RIP with the following constant: 



, ^/2-l-4(D) .... 

^'^^^ - 1 + 4(D) • ^^^^ 



Moreover, using the Johnson-Lindenstrauss lemma, it has been shown that (see Theorem 5.2 in |T7| |) 
a random matrix A whose elements are drawn independently at random from Gaussian, Bernoulli or 
subgaussian distributions satisfies RIP as long as we have: 

m > cklog(n/k), (23) 

for a constant c depending on the RIP constant of A i.e., the higher 6k{A), the smaller c. If D is not a 
unitary matrix, 6^(1)) becomes a positive constant and the more coherent the columns of D, the larger 
its RIP constant. Therefore, there is a tradeoff for compressed sensing using redundant dictionaries: 
redundancy can result in a more compact representations of the signals i.e., smaller k, and thus less 



measurements are required for CS recovery using ( [18] ). Meanwhile, too much redundancy can lead to 



an awfully large constant in ( [23] ) implying that more CS measurements are required to overcome the 
uncertainties brought by over completeness. 
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B. Performance Bounds for Compressed Sensing using Asymmetric -RIP Dictionaries 



In Section |IV-C| we will show that applying the classical RIP based analysis results in conditions that 
are too restrictive to guaranty the source recovery. Therefore in this part and in order to overcome such 
limitations, we derive a new theoretical performance bound that uses different notions of RIP. We begin 
by introducing the notions of the asymmetric restricted isometry property (A-RIP) and the restricted 
condition number of a dictionary D. 

Definition 1. For a positive integer k , an nxd matrix D satisfies the asymmetric restricted isometry 
property, if for all k- sparse x G the following inequalities hold: 

A(D)||x||2 < ||Dx||2 < ZYfc(D)||x||2, (24) 

where, Ck(D) and Uk(D) are correspondingly the largest and the smallest constants for which the 
inequalities above hold. The restricted condition number of D is defined as: 

In addition, we use a different notion of RIP for the compression matrix A, namely, the Dictionary 
Restricted Isometry Property (D-RIP), proposed by Candes et al. in |2p|: 

Definition 2. For a positive integer k gN, a matrix A satisfies the D-RIP adapted to a dictionary D as 
long as for all k-sparse vectors x the following inequalities hold: 

(1 - 6l)\\Dx\\l < \\ABx\\l < (1 + 6l)\\Bx\\l (26) 

The D-RIP constant 6^ is the smallest constant for which the property above holds. 

This definition extends the classical RIP (which deals with signals that are sparse in the canonical 
basis) to linear mappings that are able to stably embed all low dimensional subspaces spanned by every 
k columns of a redundant dictionary D. 

As in f20l, we suppose that A is an m x n matrix drawn at random from certain distributions that 
satisfy the following concentration bound for any vector x: 

Pr ( |px||2 - IIXII2I > ^ll^lli) < Cexp(-cm) , (27) 

for some constants C and c > that are only depending on t. Then, A will satisfy the D-RIP for any 
n X d dictionary D with overwhelming probability if 

m>0{klog{d/k)). 
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Remark 1. Matrices A E W^^^ whose elements are independently drawn at random from Gaussian, 
Bernoulli or (in general) subgaussian distributions satisfy the concentration bound in ([27]) and therefore 
satisfy D -RIP for any n x d dictionary as long as 0(k\og(d/k)). 

Based on these definitions we establish the following theorem in order to bound the performance of 
the ii minimization in ([TS]): 



Theorem 1. Given a matrix A that satisfies the D-RIP adapted to a dictionary D, with the constant 
^jk < 1/3 where 7 > 1 + 2^^^(D), then the solution 9 to ( [TS] ) obeys the following bound: 

11^ - ^Ib < Co _ ^^11^ ^ ^/^^^ (28) 

for some positive constants Cq, c'l. 

The proof of this theorem is given in Appendix. Using Remark[T} the following result is straightforward: 

Corollary 1. For A whose elements are drawn independently at random from Gaussian, Bernoulli 
or subgaussian distributions, the solution to ( [TS] ) obeys the error bound ( [28] ) with an overwhelming 
probability and for any dictionary with a finite Restricted Condition Number ^^/^(D), if 

m>jk log {d/jk). (29) 



Comparing to the bound ( [23] ) based on the classical RIP analysis, we see that ( [29] ) features the same 
scaling-order for the number of measurements. In addition, for both types of analysis the constant factors 
grow as the atoms of the dictionary become more coherent and therefore, more CS measurement are 
required. 

Note that this result requires neither AT> nor the dictionary D to satisfy the classical RIR In the next 
section, we apply these results to guaranty the performance of the ii minimization approach ( [TO] ) for 
source identification and in particular, for the case where H is not well-conditioned. 



C. Theoretical Guaranties for Source Recovery using £1 Minimization 

Sparse source recovery from compressive measurements using ii minimization ( [TO] ) is a particular case 
of the compressed sensing problem using dictionaries ( [TS] ). Indeed, for the source recovery problem, 9 
and the dictionary matrix D are replaced respectively with Qyec and = = (H ® Id^J*, and 
consequently, n = nin2 and d = pni. The only difference here is that is a tall matrix (i.e., d < n) 
due to its specific construction and the assumption of having few number of sources (i.e., p < 712). 
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Though there is no redundancy in in terms of the number of columns, there is uncertainty at the 
sparse decoder because of coherent columns. The following lemma which has been proven in |[3l (see 
Lemma 2 in [3 |) shows that the conditioning of ^' is directly related to the conditioning of the underlying 
mixture parameters i.e., intuitively, if the columns of H become coherent, so become the columns of $^ 

Lemma 1. For matrices Vi,V2,---,V£ with restricted isometry constants 5/c(Vi), 5/e(Vi ),..., ^/^(V^) 
respectively, we have: 

t 

Since the RIP constant of any orthonormal basis is zero {e.g., 5/e(Id^J = 0), and since * is an 
orthogonal matrix, we can deduce the following bound on the RIP constant of = (H ® Id^J^^ by 
applying Lemma [T| 

< Skin) (31) 

< r? ^ max (l - a^iJH), al^CH) - l). (32) 

For k < p one can use ( [3T] ) (which then holds with equality), and more generally ([32]) for any k. Note 
that ([32]) follows by the definition of the RIP constant and it only holds if H is properly normalized so 
that 1 < a^ax(H) < 2 and < a^in(H) < l.Q 

Moreover, due to the properties of the extreme singular values of the Kronecker product of two matrices: 

(^1) 

— ^min ^min (^2), 

and according to Definition [T] we can bound the restricted condition number of as follows: 

^($0 < = ^^^^ ^ e(H), (33) 

where, ^(.) (without subscript) denotes the standard definition of the condition number of a matrix. With 
those descriptions, the performance of the sparse source recovery using ([TOj) can be easily characterized 



by any of the previous types of performance bound of sections IV-A and IV-B 



^This can be done by dividing H and multiplying S by (crmax(H) + crmin(H)) /2, respectively. 
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According to the standard definition of the RIP for the matrix we can bound its restricted condition 
number ^^(^0 follows: 

6(^0 < 



'1 + 4(^0 



1 - 4(^0 



Recall that, the classical RIP based analysis in section IV-A requires 5/c(^0 < — 1 (in order to 
have 6k{A) > in (|22])), which implies C/c(^0 < \/\/2 + 1, or consequently ^(H) < \/\/2 + 1. This 
severely restricts the application of such analysis for a limited class of relatively well-conditioned mixture 
parameters. 

To address this limitation, we use the second theoretical analysis based on the D-RIP of the compression 
matrix presented in section |IV-B[ The following theorem is a corollary of Theorem [TJ 



Theorem 2. Given a mixture matrix H whose condition number is C(H), and a matrix A that satisfies 
the D-RIP adapted /(9 H ® Id^^ with the constant 5*,^ < 1/3 where 7' = 1 + 2^^(H), then the solution 
@vec to ([To]) obeys the following bound for the same constants Cq^c[ as in ([28]); 



\\®vec - ^vech < 4^ ^^^W^vec " {®vec)k\\l + c[e. (34) 

Comparing to Theorem [1} D is replaced by and 7 is set to 7' which satisfies the requirement of 
Theorem |l] /.e., according to ( [33] ) we have y > 1 + 2^2 ^(H). As we can see, this analysis is valid for 
a much wider range of condition number namely, C(H) < jsj 

Now, if we use this approximation to recover the multichannel data i.e., X = SH^, the reconstruction 
error can be bounded using ( [34] ) and the following inequality: 

< amax(H)||S - S||f 

= a^ax(H)||0-0||i.. (35) 

Theorem [2] indicates 5*,^ < 1/3 as the sufficient condition for the sparse source recovery. In the 
following we investigate the implication of this condition for the previously mentioned acquisition 
schemes to bound the number of CS measurements. 

^As for jk> nin2 an nin2 x nin2 identity matrix A always satisfies dyj^ = (i.e. there is no advantage by replacing the 
full Nyquist sampling with CS), Theorem [2] becomes useful only when we have < nin2 which for the value of 7^ in the 
theorem implies ^(H) < ^ ^1^2^-1 ^ 



August 23, 2012 



DRAFT 



15 



1) Dense Random Sampling: Assume the compression matrix A that is used for subsampUng data 
in ([T|) is an m X nin2 matrix whose elements are drawn independently at random from the Gaussian, 
Bernoulli or subgaussian distributions. According to Remark [1} such matrices satisfy D-RIP adapted to 
$ (with the constant 5*,^ < 1/3) provided by: 

m > -i'k \og{pni/-i'k)). (36) 

2) Uniform Random Sampling: The same type of analysis indicates a very poor performance for the 
uniform random acquisition scheme described in section |III-B2[ The corresponding sampling matrix has 
a block-diagonal form A = Idn^ ® A. Here, we assume that the core compression matrix A that separately 
applies to each channel is an m x ni matrix whose elements are drawn independently at random from 
Gaussian, Bernoulli or subgaussian distributions. 



According to the theoretical analysis provided in section |IV-A[ the sufficient condition for source 
recovery via ( [T0| ) is 5k{A) < ^ which, by considering ([32]) can be rephrased as: 

For a compression matrix with this structure and by using Lemma [l] we can deduce 5/e(^) ^ ^/c(^)- 
Now similarly as for the bound ( [23] ), A satisfies the RIP with the constant above (and so does Al) as 
long as m > c/c log(ni//c)) or equivalently, 

m > cn2 k log{ni/k)). (37) 

The constant c depends on the conditioning of the mixture matrix H. When the columns of H are 
very coherent, the extreme singular values spread away from each other and 77 becomes large. As a 
consequence, A (or equivalently A) must satisfy RIP for a smaller constant which, as discussed earlier 
in section |IV-A[ implies c to be large and more CS measurements are required for source recovery. 

3) Decorrelating Random Sampling: When a decorrelation step is incorporated into the compressive 
acquisition process, H is discarded in the recovery formulation, and then we can use the standard RIP 
analysis in ||2]| to evaluate the source recovery performance. Therefore, if A = Idp^A satisfies the 
RIP with a constant 5k{A) < \f2 — 1, then the solution to ( [TT] ) obeys the following error bound: 



where the constants cq, c\ are the same as in ( [20] ). 

Now, since A is a block diagonal matrix, we can proceed along the exact same steps as for the 



uniform sampling scheme (Section IV-C2 ) to bound the minimum number of CS measurements such that 
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CS Acquisition Scheme 


Dense 


Dense 


Uniform 


Decorrelating 


CS Recovery Approach 


BPDN 


SS-^i 


SS-^i 


SS-^i 


CS measurements m > 


C>(n2/clog(ni//c)) 




o(n2/clog(ni//c)) 


o(^klog{pn,/k)) 


Constant depends on H 




Yes 


Yes 


No 



TABLE I: Measurement bounds for random sampling schemes: dense, uniform and decorrelating, and 
for recovery approaches: BPDN and SS-^i (i.e. source separation based recovery using ( [T0| ) or (fTT])). The 
last row shows if the bounds for the SS-ii are sensitive to the conditioning of the mixing matrix H. 



A satisfies the RIP: 

m >ck log(ni//c). 

Unlike the previous measurement bounds for the non-decorrelating sampling schemes, here c is a fixed 
constant independent of the mixture matrix H. Consequently, the total number of CS measurements used 
for source recovery is: 

m > cpk log{ni/k)). (38) 

Note that, for a noiseless sampling scenario (e = 0) the minimization ( fTT) ) can be decoupled into p 
independent ii minimizations, each of them corresponding to a sparse recovery of a certain source. Now, 
if we assume that each source has exactly k' ^ k/ p nonzero coefficients, then a perfect recovery can 
be guaranteed as long as 5^' {A) < \f2 — 1 which, for a matrix A drawn form the previously-mentioned 
distributions, implies that m > ck' \og{n\lk') and consequently: 

m — pm > cklog(pni/k). (39) 

Comparing to ( [38] ) where m is roughly proportional to pk, here the measurement bound improves by 
a factor p and it is mainly proportional to the sparsity level k of all sources. 



D. Conclusions on the Theoretical Bounds 

Consider a multichannel data derived by the linear mixture ([7]) of p sources, each having a k'- 
sparse representation i.e. S is /c = pk' sparse. Table [l| summarizes the scaling-orders of the number 
of CS measurements sufficient for an exact data reconstruction for different noiseless random acquisition 
schemes and sparse recovery approaches. As we can observe, compressed sensing via source recovery 



using (10) once it is coupled with a proper CS acquisition {i.e.. Dense i.i.d. subgaussian A, or a random 



decorrelating sampling scheme as in sections III-B 1 and III-B3) leads to a significantly improved bound 
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compared to standard methods such as BPDN. More remarkably, the number of CS measurements turns 
out to be independent of the number 77.2 of channels. 

Finally note that the measurement bound for the source-separation-based reconstruction approach, 
which uses a non-decorrelating random compression matrix, depends on the conditioning of the mixture 
parameters via the constant factor 7' in ([36]). Therefore, when the columns of H are highly coherent, the 
condition number of H becomes relatively large, and so does 7^ This limitation can be circumvented 
thanks to the decorrelating acquisition scheme. 

V. Applications in Compressive Hyperspectral Imagery 

Compressed sensing is particularly promising for hyperspectral imagery where the acquisition procedure 
is very costly. This type of images can be approximated by a linear mixture model as in ^ where each 
spatial pixel is populated with a very few number of materials (i.e. sources). In this regard, S G [0, 
is a matrix whose p columns are source images (vectorized 2D images) indicating the percentage of each 

material in one of the ni spatial pixels, and therefore 

p 

J^[S],,, = 1 VzG{l,...,ni}. (40) 

Moreover, H G R^^^^ is a matrix whose columns contain the spectral signatures of the corresponding 
sources of S. Note that in some particular applications and specially when the spatial resolution is high 
enough, the source images become disjoint, meaning that each spatial pixel contains only one material 
and [S],,^- G {0,1}. 

The two key priors that will be essential for compressive source identification are the following: i) 
Each source image contains piecewise smooth variations along the spatial domain, implying a sparse 
representation in a wavelet basis, or sparsity of its gradient, and ii) each spatial pixel is a non-negative 
linear combination of a small number of sources. 

In the next two sections we introduce two classes of source separation based recovery approaches that 
are particularly adapted to hyperspectral compressive imagery. 

A. Compressive HSI Source Separation via Convex Minimization 

According to our earlier assumptions, source images are spatially piecewise smooth, which means the 
coefficients of S = ^^20® are sparse in a 2-dimensional wavelet basis ^^2d ^ W^^^^K We conveniently 
rephrase this representation in a vectorized form S^ec = "^^vec with * = Id^ 0^^2d as described in 
Section HTbI 
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Taking into account the sparsity of Qvec and by incorporating specific assumptions such as ( |40l ) and 
non-negativity we can extend the £i minimization approach in ( fTO] ) as follows: 



argmin ||0x;ec||i (41) 


subject to \\y — A$*0^ec||2 ^ ^ 

Where, In denotes an all one n-dimensional vector. The first constraint is the same as the fidelity constraint 
in (fTOl). The last two constraints impose the element-wise non-negativity of S and the "percentage" 



normalization ( [40| ) i.e., each row of S belongs to the positive face of the simplex in R^. Minimizing the 
ii norm together with the last two constraints (that is equivalent to an additional ii norm constraint) 
gives solutions that contain both desired sorts of sparsity: i) along the 2D wavelet coefficients of S and, 
ii) along each row of S. 



Note that the theoretical analysis given in Section IV-C can also apply here to bound the performance 



of ( |4T] ). Although we bound the error similarly as for ( fTO] ), one can naturally expect a much better 



performance for ( |4T] ) thanks to the two additional constrains. 



Alternatively, problem ( [41] ) can be formulated in a more general "analysis" formulation with an analysis 
sparsity prior V{S): 

argmin V{S) (42) 
s 

subject to \\y - A^S 

^vec ^ 0- 

which is equivalent to ( |4T] ) when 7^(S) = ||**S^ec||i and * is a square and invertible operator. Another 
efficient analysis prior for image regularization is the Total Variation which can be applied on each 
source image of the HSI with the prior: P(S) = J2j W^jWrv- The problem formulation ( |42l ) is general 
and includes the decorrelating schemes discussed in sections |III-B1| and |III-B3[ Indeed inserting the 
matrix A of ([12]) in ( [42] ) leads to the following fidelity term \\y — Ap S^edb < ^ while the other terms 
remain unchanged. 

In the next Section we provide an iterative algorithm for solving problem ([42]). When sources are 



August 23, 2012 



DRAFT 



19 



disjoint, it is also possible to add a hard thresholding post-processing step that sets the maximum 
coefficient of each row of S equal to one and set to zero the other coefficients. 

B. The PPXA Algorithm for Compressive Source Separation 

The Parallel Proximal Splitting Algorithm (PPXA) pT| is an iterative method for minimizing an 
arbitrarily finite sum of lower semi-continuous (l.s.c.) convex functions. Each of the iteration consists in 
computing the proximity operator of all functions (which can be done in parallel), averaging their results 
and updating the solution until convergence. The proximity operator of a function /(x) : ^ R is 
defined as proXf :W 

argmin f(x) + -||x — x\\2. (43) 



For solving ( [42| ) with PPXA, we rewrite it as the minimization of the sum of three l.s.c. convex 
functions: 

argmin /^(S) + /2(S) + /3(S), (44) 
s 

with /i(S) = 'P(S), /2(S) = ^i32(S) and /3(S) = ^/3a+(S) and where ic is the indicator function of a 
convex set C defined as: 

'0 if S G C 

+00 otherwise, 



and the convex sets B2^Ba+ C R^i><^ are respectively, the set of matrices that satisfy the fidelity 
constraint \\y — A^S^edb ^ the set of matrices whose rows belong to the standard simplex in 



R^. The template of the PPXA algorithm that solves ( |44| ) and hence ( |42l ) is given in Algorithm 1. We 
now derive the proximity operator of each function fi. Note that the definition of the proximity operator 
in ( [43] ) naturally extends for matrices by replacing the £2 norm with the Frobenius norm. 
For 7^(S) = ||**S^ec||i. a standard calculation shows that 

{prox^^)i = sign((**S^ec)0 • (|(**S^ec)i| - + , (45) 

which is the soft thresholding operator applied on the wavelet coefficients of S. The proximity operator 
of 7^(S) = J2j=i l|Sj||ry can be decoupled and computed in parallel for each of the p sources via an 
efficient implementation proposed by [22[ . By definition, the proximal operator of an indicator function 
ic(S) is the orthogonal projection of S onto the corresponding set C. The projection onto the standard 
simplex 5a+ can be done in one iteration using the method proposed by Duchi et al. |23|. For a general 
implicit operator L = the projector onto B2 can be computed using a forward backward scheme as 
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Algorithm 1: The Parallel Proximal Algorithm to solve (|42l) 



Input: A, 5, /3 > 0. 
Initializations: 

n = 0, So = ri,o = r2,o = r3,o e R"^><"^ 

repeat 

for (i = 1 : 3) do 

I Pi,n — P^OX^i^j^.(Ti^n) 

end 

Sn+l = + P2,n + Ps^n)/^ 

for (z = 1 : 3) do 
end 

until convergence; 



proposed in |24|. This projection usually has the dominant computational complexity of the algorithm 
because of costly sub-iterations. However if the decorrelating sampling scheme is used and L = is a 
tight frame (i.e., \/x G LUx = z/x for a constant z/), then according to the semi-orthogonal linear 



transform property of proximity operators pT| , the orthogonal projection onto B2 has the following 
explicit form: 



(prox„^,(S))^^^ = S„ec + Jw*r(l-^) , (46) 



with r = y — A^S 



p^vec 



C. Compressive HSI Source Separation via Iterative Hard Thresholding 

If the source images are disjoint, the following non-convex minimization can be alternatively used for 
recovering the sparse wavelet coefficients of the sources: 

argmin \\y — 

A^^@^ec\\l (47) 

e 

subject to ||0,,ec||o < ^ 

Off diag(0*0) = 

^@vec > 0. 
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Algorithm 2: The Iterative Hard Thresholding Algorithm to approximate solution of (|47l) 



Input: A, 7 = = and k. 

Initializations: 

n = 0, 0° ^W^^'P 

repeat 

1- Gradient descent: 

0^e+c' = 0^ec-7VF(0«) 

2- Hard thresholding: 
0-+i=Th,(0-+i) 

3- Orthogonal matrix procrustes: 
Update 9. : = ||0;.+i||^ 
Singular value decomposition: UTy = 

4- Simplex projection: 
0n+i ^ Project^^^(*2D0"+') 

until convergence'. 



where the operator Off diag(S) returns the off-diagonal elements of matrix B, and the ^0 norm constraint 
on &vec imposes the wavelet coefficients to be fc-sparse. The second constraint imposes the orthogonality 
of the wavelet coefficients which is a consequence of the source disjointness. The two last constraints 
are the same as in ( |4T] ). 

Despite its convex objective term, ( |47| ) has multiple non-convex constraints and is therefore a non- 
convex problem. We propose an algorithm similar to the Iterative Hard Thresholding (IHT) algorithm 



[ [T5| to approximate the solution of ( |47l ). At each iteration the current solution is updated by a gradient 
descent step followed by a hard thresholding step Th/e( ) that selects the k largest wavelet coefficients 
of @vec' In addition the three last constraints of ( |47l ) are applied sequentially: 

• First, a procedure inspired by the orthogonal matrix procrustes is applied to diagonalize 0*0. Let 
be a p X p diagonal matrix where for 1 < z < p we have 

||0||f 

Since for disjoint sources we have ||S||i7 = = then a good orthogonal matrix that would 



approximate and keeps the energy of the current estimate of each source image proportional to 
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that of the previous estimate would be UV^Q through the following singular value decomposition 

• Second, the current solution S = ^^20® is projected onto the standard simplex as in ||23l. 
The description of the this algorithm can be found in Algorithm 2. Note that the gradient of the 
objective functional F(0) = \\y - A^'^QyecWl i^- 

VF(0) = -(A^^y{y - A$*0,ec). (48) 

Using the decorrelating scheme, the objective function in ^4T) becomes F(0) = \\y — Ap'^QyecM with 
gradient : 

VF(0) = - lp*0.ec)- (49) 

The rest of Algorithm 2 remains unchanged. 

In the next section, we evaluate the performances of these algorithms on HSI. 

VI. Experiments 

In this section, we evaluate the ability of the methods presented in Section |V} (called "SS methods'' 
and summed up in table |ll]) to separate the sources and recover HSI in various scenarios: various noise 
levels (from noiseless to 10 dB SNR), various sampling ratios (from 771/(711/12) = 1/4 to 1/32 sampling 
rates), various sampling mechanisms (uniform and dense sampling), on two different HSI (Geneva and 
Urban). We also compare the SS methods with the classical methods for CS, such as the BPDN problem 
^ BPDN, the TVDN problem ^ TVDN, both solved with a Douglas-Rachford (DR) spHtting algorithm. 



A. Sampling Mechanism 

We used two different sampling schemes: i) the sensing matrix A is dense (and the methods imple- 
menting the decorrelation step cannot be applied), and ii) uniform sampling where the sensing matrix 
is block diagonal with identical blocks as in ([15]). In the latter, the decorrelation step can be applied as 



explained in section III-B 



So as to generate the random sampling matrices A and A that can be used in practical applications, we 
used the Random Convolution (RC) measurement scheme proposed by Romberg [[25| that convolves the 
image with a random pattern using few optical blocks. More remarkably, sampling matrices generated 
by RC are tight frames and thus for decorrelating schemes, they benefit from a closed form expression 
( |46l ) for computing prox^f^{-) that can massively accelerates the recovery procedure. 
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TABLE II: Description of the proposed SS methods. 



Method name 


Description 


SS-IHT 


Problem (|47]) solved with Algorithm 2 with gradient VF(0) of Eq. (|48]l. 


SS-11 


Problem ([42]) solved with Algorithm 1, with V{S) = ^*S^;ec i and prox^f^{-) computed using a 
forward-backward scheme as proposed in p4). 


SS-TV 


Problem p2]) solved with Algorithm 1, with V{S) = J2j=i W^jWtv and prox^f^{-) computed using 
a forward-backward scheme as proposed in |24|. 


SS-lHT-decorr 


Problem (|47) solved with Algorithm 2 with gradient VF(0) of Eq. (|49j. 


SS-ll-decorr 


Problem (|42|) solved with Algorithm 1, with ^(S) = ^*S^ec i, and prox^f^{-) computed with the 
closed form Eq. (|46|. 


SS-TV-decorr 


Problem ([42]) solved with Algorithm 1, with V{S) = X^^-i tv and prox^f^{-) computed with 

61}. 



0-TVDN (dense) 
-I— BPDN (dense) 
^ SS-TV (dense) 
V- SS-11 (dense) 
^-SS-IHT (dense) 
SS-TV (uniform) 
A- SS-11 (uniform) 
SS-IHT (uniform) 
SS-TV-decorr 
-^SS-ll-decorr 
K-SS-IHT-decorr - 




Subsampling ratio (m/n^n^) 




Sampling SNR (dB) 



(a) Reconstruction SNR vs. subsampling ratio (noiseless (b) Reconstruction SNR vs. sampling SNR (subsampling 
sampling) ratio : 1 / 1 6) 



Fig. 1: Geneva HSI reconstruction performance for different sampling mechanisms and recovery methods. 
Points with oc reconstruction SNR (exact recovery) are not plotted. 



B. The Geneva HSI 



We evaluate the different methods, for different sampling rates (Fig. 1(a)), and different noise levels 



(Fig. 1(b) ), on a HSI generated from a ground truth map image |^ of farms in a suburb of Geneva. The 



source spectra (i.e. columns of H) are chosen form the USGS digital spectral library ]26[.The HSI cube 
has spatial slices of the resolution N = 256 x 256 that are taken over J = 224 frequency bands. 

"^We acknowledge Xavier Gigandet and Meritxell Bach Cuadra for providing this ground truth map. 
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TABLE III: Source separation performance (Accuracy) of SS methods. Methods with the highest accuracy 
are highlighted in each column. 



Noise SNR 


+00 dB 


30 dB 


10 dB 


Sampling rate 


1/4 


1/8 


1/16 


1/32 


1/4 


1/8 


1/16 


1/32 


1/4 


1/8 


1/16 


1/32 


SS-IET (dense sampling) 


0.69 


0.61 


0.57 


0.48 


0.71 


0.6 


0.57 


0.48 


0.7 


0.6 


0.57 


0.48 


SS-ll(dense sampling) 


1.0 


1.0 


0.95 


0.81 


1.0 


1.0 


0.95 


0.8 


1.0 


0.98 


0.91 


0.73 


SS-TV(dense sampling) 


1.0 


1.0 


1.0 


0.92 


1.0 


1.0 


1.0 


0.91 


1.0 


1.0 


0.98 


0.88 


SS-1E.T( uniform sampling ) 


0.43 


0.38 


0.31 


0.25 


0.43 


0.37 


0.31 


0.26 


0.43 


0.37 


0.3 


0.26 


SS-ll(uniform sampling) 


0.97 


0.73 


0.45 


0.31 


0.95 


0.73 


0.48 


0.3 


0.96 


0.75 


0.42 


0.3 


S S - TVf uniform sampling ) 


1.0 


0.98 


0.9 


0.76 


1.0 


0.97 


0.89 


0.74 


1.0 


0.97 


0.88 


0.74 


SS-IHT-decorr 


0.98 


0.98 


0.96 


0.94 


0.99 


0.98 


0.96 


0.94 


0.98 


0.97 


0.95 


0.92 


SS-ll-decorr 


1.0 


0.99 


0.97 


0.92 


1.0 


0.99 


0.96 


0.91 


0.98 


0.95 


0.92 


0.87 


SS-TV-decorr 


1.0 


1.0 


1.0 


1.0 


1.0 


1.0 


1.0 


1.0 


1.0 


0.99 


0.98 


0.96 



1 ) Performance of the SS methods: Concerning the performance of the SS methods, we observe in 
Fig. [1] that: 

• The dense sampling scheme is always better than the uniform sampling scheme. 

• The decorrelated scheme is always better than dense sampling for Tl/-based and IHT methods, but 
is not for the ^1 -based method. 

• The decorrelating method SS-TV-decorr results in perfect reconstruction in the cases where the 
sampling ratio is higher or equal to 1/16 and performs better than all the other methods in all 
regimes, except in high noise of 10 dB SNR, where the dense approach SS-TV (dense sampling) 
performs slightly better. 

2) Comparison with Classical CS Methods: We observed that SS-TV-decorr always obtained 
significantly better results than the classical CS methods in all regimes. 



3) Source Reconstruction: We reported in Tab. Ill the source separation performance of the SS methods. 



Since source images are disjoint, the quality was measured by the source recovery accuracy indicating 
the percentage of correctly classified pixels in the spatial domain. The method SS-TV-decorr, based 
on TV regularization and decorrelation, which achieved the best performance for HSI reconstruction 
also obtain the best performance for source separation. Figure [2] illustrates the reconstructed sources of 
different SS methods for various sampling schemes (dense, uniform, decorrelating). 

C. The Urban HSI 

In order to evaluate different approaches on a real HSI, we consider the Urban HSI of size 256 x 
256 X 171 which was obtained from the site |27 | of the US Army Topographic Engineering Center. 
As the ground truth of this image (i.e., the true source images and their corresponding spectral 
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(a) True sources 



(b) SS-TV (dense sampling) 



} 



(c) SS-TV (uniform non-decorrelating sampling) 



-A 



(d) SS-TV-decorr (uniform decorrelating sampling) 




1 at 



(e) SS-IHT-decorr (uniform decorrelating sampling) 



Fig. 2: Estimated source images of Geneva HSI for different sampling schemes and recovery methods 
(subsampUng ratio: 1/16, noiseless sampling). 



signatures) is not available, we first separate the underlying sources using a blind source separation 
algorithm for fully-sampled HSI [ [T0| and later, use these separated sources, depicted in Fig. |3(a)[ as a 
reference. Figure [3] demonstrates the reconstructed sources of Urban using our proposed SS approaches 
based on convex minimization, for different noiseless sampling mechanisms (dense, uniform, uniform- 
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decorrelating) and for a fixed subsampling ratioj^ Moreover, Figure |4] siiows tiie reconstructed Urban 
HSI for a certain spectral band, using the source images estimated by the SS methods based on TV 
minimization (i.e., SS-TV and SS-TV-decorr). 

a) Results: Similar to our previous experiment, we observe that for a uniform (non-decorrelating) 
sampling scheme SS-TV has very poor recovery performance. Meanwhile, adding a decorrelation step 



results in a significant improvement in source recovery. As we can see in Figure |3(b)[ the estimated 
source images using SS-TV for a dense sampling scheme have better spatial resolutions, but are not as 
well separated as with the SS-TV-decorr method. 

b) Computational Performance: Decorrelation step massively decreases the computational complex- 
ity. SS-TV-decorr performs within 20 minutes whereas SS-TV for a dense sampling scheme requires 
more than 80 hours of computations! The classical BPDN and TVDN methods take between 3 to 14 
hours, as the corresponding or TV minimization runs over a large number of channels (rather than few 
underlying sources). We ran all the codes on a Mac Pro 2.26 GHz Intel CPU, 16 GB RAM computer. 

D. Conclusion on the Experiments 

The decorrelation step is of great benefit, and the proposed method SS-TV-decorr, based on TV 
regularization and decorrelation, outperforms significantly all the other methods for HSI reconstruction 
and source estimation for all tested SNRs and sampling rates. Moreover SS-TV-decorr is clearly the 
fastest method and is more than 40 times faster than the classical TVDN. 

While finalizing this work we became aware of a recent paper [ [28| that proposes a source recovery 
approach similar to (|42]), albeit for the particular case of uniform sampling and TV regularization. The 
authors also use a "SVD preprocessing" step for dimensionality reduction and denoising that, contrary to 
our decorrelation step, does not cancel the effects of the conditioning of the mixing matrix. Comparing 
the fourth source image in Figure |3(e)[ corresponding to the "roads", with the similar source recovered 



in Figure 6 in [ |28| | indicates that SS-TV-decorr achieves similar or better separation performance 
with only half the measurements rate. Additionally we provide a theoretical analysis for the compressive 
source separation problem, considering various sampling schemes and multiple recovery methods. 

VII. Conclusion 

In this paper, we exploited a linear mixture of sources model into a Compressed Sensing (CS) scheme 
for multichannel signal acquisition and source separation with a particular focus on hyperspectral images 

^As the source images of Urban are not spatially disjoint, we do not apply Algo. 2. 
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(HSI). We study three different acquisition schemes (dense, uniform and decorrelated) theoretically and 
experimentally, and showed that the decorrelating scheme enhances drastically the recovery of the spectral 
data and its sources. Indeed, our theoretical analysis showed that, using this scheme, and contrary to the 
traditional CS approach, the number of measurements does not scale with the number of channels and does 
not depend on the conditioning of the mixing matrix, as long as the mixed spectra are linearly independent. 
This leads to a strong reduction in the number of needed measurements for a given reconstruction error. 
We also provided algorithms that reconstruct the multichannel signal (more particularly HSI) and its 
sources, by exploiting both sparsity of the signal at each channel and the correlation of the signals along 
the channels. We provided experiments on HSI and showed that we can reconstruct both the HSI and 
its sources with far fewer measurements and less computational effort than traditional CS approaches. 
Finally, we showed that it is possible to accurately recover the sources directly from the compressed 
measurements, avoiding to run a source separation algorithm on the high-dimensional raw data. 

Extension of this work includes dealing with non-linear mixture of sources as well as dealing with the 
difficult problem of recovering the sources and the mixing system from the compressed measurements. 

VIII. Appendix 

A. Proof of Theorem [7] 

Let G be the original vector we aim to recover from its CS measurements y with y — ADO + z 
and II z II 2 < s, and let 9 be the solution of the li minimization ([18]). The reconstruction error is denoted 
h = 6 — 9. Let To ^ {0, . . . , d} be the set that contains the indices of the k coefficients of 9 having 
the largest magnitudes and, Tq the complement set of To- Let 9j' denote a vector of the same size as 9 
whose elements indexed by the set T are identical to that of 9 and zero elsewhere. 

Minimizing the ii norm in ( fTS) ) implies 

ll^lli > \\0 + h\\i 

= ll^ro + /^rolli + ll^ro^ + /^roHli 

> ll^rolli-||/^rolli-||^roHli + IIHHIi, 

and therefore, 

IIHHIi < \\hTo\\i + 2\\er„4i- (50) 

Let 7i be the set that contains the indices of the rk coefficients of 9^^^ having the largest magnitudes, 
72 the set containing the indices of the second rk largest coefficients of 6^7^^ and so on. With this 
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decomposition, Vj > 2 we have: 

\\hT,h<{Tk)-''^\\hTa\l^ 

and thus, 

J>2 



Now according to ( [50| ) and since hqi is fc-sparse we have 



J]||/i7^-||2<r-V2||/i^J|2 + 2(rfc)-V2||^^^.||,. (51) 
i>2 



On the other hand, since both 6 and 6 satisfy the fidelity constraint of ( fTS] ), we have 

WADhh < \\y - Aiye\\2 + ||y - ADOh < 2e. 
Let's define 7oi := To U 7i and 7 = r + 1. According to the last inequality we can write 
2e > pD/i||2 

> \\ADhTj\2-Y.\\^^Hh 

i>2 



> Ji- s;, ||DH. II2 - ^1 + E IID^T^ II2 

i>2 



i>2 



> C,kiD)^l-S;,\\hrJ\2- W.fc(D)^l + 5;,(r-V2||/,^J|2 + 2(rfc)-V2||^^^.||^j. 

The third inequahty follows from definition of the D-RIP (see Definition [2]) which holds for the matrix A, 
together with the fact that and hj. (Vj > 2) are respectively and rk sparse. The fourth inequality 
follows from the definition of the A-RIP that holds for matrix D (see Definition [T]), and finally the last 
inequality uses ( |5T] ). We apply the bounds 5*^ < 5*^, Urk(D) < i^jk(D) and ||/i%||2 < II^TLilb in the 
last inequality and we deduce the following bound: 

\\hTo,h<cxk-'^H0%4i + /3^^ (52) 
where the constants a, /? are a = , ^ , and /3 = 2ZY-,.(DVr(i+^^ ^ 

Now if we set r > 2^^^(D) (equivalently, 7 > 1 + 2^^^(D)), it is sufficient to have 5*^ < 1/3 so that 
a and /3 remain positive. Finally we conclude the proof of Theorem [T] by using the inequalities ( |5T] ) and 
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(52) to bound the whole error term as follows: 



\\hh < l|/^r„J|2 + J]||/i7^-||2 



i>2 



< (i + r-'/')\\hrJ\2 + 2{Tk)-'/'\\erAi 



< c'ok-'/^\\er,4i + c'ie, 



where, the constants of the error bound are Cq = o: + (2 + a)r and c[ = + r 



l/2y 
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(a) Reference: Sources estimated with a BSS algorithm. 















(b) SS-TV (dense sampling), source reconstruction SNR: 6.34 dB 





(c) SS-11 (dense sampling), source reconstruction SNR: 6.29 dB 













(d) SS-TV (uniform non-decorrelating sampling), source reconstruction SNR: 1.88 dB 















(e) SS-TV-decorr (uniform decorrelating sampling), source reconstruction SNR: 8.64 dB 



(f) SS-11- decorr (uniform decorrelating sampling), source reconstruction SNR: 5.65 dB 

Fig. 3: Estimated source images of Urban HSI using different recovery methods {i.e., TV or wavelet 
minimization), and for different sampling mechanisms (subsampling ratio: 1/8, noiseless sampling). 
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Subsampling ratio 4:1 Subsampling ratio 8:1 Subsampling ratio 16:1 




Reconstruction SNR 26.44 dB Reconstruction SNR 22.04 dB Reconstruction SNR 1 8.59 dB 




Reconstruction SNR 10.36 dB Reconstruction SNR 8.547 dB Reconstruction SNR 8.185 dB 




Reconstruction SNR 1 9.84 dB Reconstruction SNR 1 6.33 dB Reconstruction SNR 1 4.32 dB 



Fig. 4: Reconstructed Urban HSI at spectral band 33, using SS methods based on TV minimization, 
for various sampling mechanisms (Dense, Uniform non-decorrelating. Uniform decorrelating) and 
subsampling ratios. 
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